Uncovering nasopharyngeal carcinoma from chronic rhinosinusitis and healthy subjects using routine medical tests via machine learning

Nasopharyngeal carcinoma (NPC) is one of the most common types of cancer in South China and Southeast Asia. Clinical data have shown that early detection is essential for improving treatment effectiveness and survival rate. Unfortunately, because the early symptoms of NPC are rather minor and similar to those of diseases such as Chronic Rhinosinusitis (CRS), early detection is a challenge. This paper proposes detecting NPC from routine medical test data using five machine learning methods: Random Forest (RF), Support Vector Machine (SVM), Artificial Neural Network (ANN), k-Nearest-Neighbor (KNN) and Logistic Regression (LR). We collected a dataset containing 523 newly diagnosed NPC patients before treatment, 501 newly diagnosed CRS patients before treatment, and 600 healthy controls. The routine medical test data include age, gender, blood test features, liver function test features, and urine sediment test features. For comparison, we also used data from Epstein-Barr Virus (EBV) antibody tests, a specialized test not included among routine medical tests. In our first test, all five methods were tested on classifying NPC vs CRS vs controls; RF gave the best overall performance. Using only routine medical test data, it achieved an accuracy of 83.1%, outperforming LR by 12%. In our second test, using only routine medical test data to classify NPC vs non-NPC (i.e. CRS or controls), RF achieved an accuracy of 88.2%. In our third test, when classifying NPC vs controls, RF using only routine test data achieved an accuracy of 95.0%, compared with 90.4% for RF using only EBV antibody data. Finally, in our last test, RF trained on NPC vs controls using routine test data only achieved an accuracy of 91.9% on an entirely separate dataset. This is a promising result because preliminary NPC detection using routine medical data is easy and inexpensive to implement. 
We believe this approach will play an important role in the detection and treatment of NPC in the future.

Introduction
Nasopharyngeal carcinoma (NPC) is a malignant tumor in nasopharyngeal epithelial cells, with unique geographic and ethnic distributions. It is reported that NPC often occurs in East and Southeast Asia [1], especially in Guangdong, China. The incidence rate is about 25 cases per 100,000 people, which is 25 times higher than in other regions of the world. NPC has posed a serious challenge to public health [2].
NPC arises mainly at the top of the nasopharynx and in the pharyngeal recesses on both sides. Because the lesion is sheltered behind other tissues and organs, it is difficult to find. Moreover, early symptoms are not obvious. Therefore, it is difficult to distinguish the onset of NPC from benign disorders such as sinusitis and rhinitis. By the time it is detected, 70-80% of NPC patients are already in a middle or advanced stage. Through the cervical lymph nodes, NPC may metastasize to distant parts of the body, greatly increasing mortality [3,4]. At present, the preferred treatment for NPC is radiotherapy, followed by chemotherapy. The prognosis of NPC is closely related to the stage at which it is detected. The consequences of late detection can be fatal. Studies have shown that the 5-year overall survival rate of patients with stage I-II NPC after radiotherapy is as high as 90.4%. For patients with stage III-IV NPC, the 5-year overall survival rate decreases by more than 15% for each stage [5]. Moreover, systemic and local adverse reactions caused by radiotherapy and chemotherapy, such as radiation-induced oral mucositis, dry mouth, limited mouth opening, cognitive impairment, and sinusitis, seriously affect the quality of life of patients [6]. Therefore, screening of high-risk groups and early detection are very important.
The pathogenesis of NPC is still unclear. Current research suggests NPC is caused by three categories of factors: Epstein-Barr Virus (EBV) infection [7][8][9], environmental factors (especially the consumption of Guangdong pickled fish) [10][11][12], and genetic factors [13,14]. Of these three, the most relevant to this paper is EBV infection. EBV-infected cells express different proteins in the incubation period and the lysis period. In the incubation period, infected cells mainly synthesize core antigen and latent membrane protein; in the lysis period, infected cells primarily synthesize early membrane antigen, early intracellular antigen, and capsid antigen. Therefore, NPC patients have specific antibodies against EBV. Henle et al [15] found as early as 1976 that the serum of NPC patients has a significantly higher level of EBV antibodies than that of people without NPC. Thus, serological detection of specific antibodies against EBV is a useful means of detecting NPC. The anti-EBV specific antibodies currently used in clinical NPC detection include VCA-IgA, EA-IgA, EBNA-IgA, and EBV DNA enzyme antibodies [16]. The tests for EBV VCA-IgA and EA-IgA are the most common and mature. Cheng et al [17] collected data from 121 newly diagnosed NPC patients before treatment and 332 healthy subjects and found that the sensitivity and specificity of VCA-IgA alone were 93% and 87%, respectively. Liu et al [18] evaluated the value of EBV-DNA, EA-IgA, VCA-IgA, EBNA1-IgA, and RTA-IgG in the detection of NPC. Their study included 8382 NPC patients and 15,089 healthy subjects. They found that EA-IgA had a sensitivity of 55% and a specificity of 96%, while VCA-IgA had a sensitivity of 85% and a specificity of 89%. It is worth noting that VCA-IgA has a high detection rate in healthy people; hence its specificity is lower. On the other hand, EA-IgA has strong specificity, but low sensitivity. 
Consequently, EBV antibody tests are usually not included in routine medical tests and thus most people miss their chance at early detection of NPC.
In recent years, artificial intelligence (AI) has become an important tool for medical diagnosis. As an essential branch of AI, machine learning has been widely used in the construction of medical diagnosis models. Nevertheless, few studies have been conducted thus far to examine the validity of using machine learning in detecting NPC.
Medical data are known to be very complex, and it is difficult to find relationships in the data by manual inspection. Machine learning can make full use of complex medical data, finding hidden patterns to achieve more accurate and efficient diagnosis while reducing the workload of doctors. Zou et al [19] used decision trees, Random Forests (RF), and Artificial Neural Network (ANN) to predict diabetes. The RF method performed best, with accuracy, sensitivity and specificity of 89.63%, 92.26%, and 87.00%, respectively. Oh et al [20] proposed using a deep learning network for early detection of Parkinson's disease. It achieved a promising performance of 88.25% accuracy, 84.71% sensitivity, and 91.77% specificity. Alickovic and Subasi [21] used genetic algorithm-based feature selection to find the most informative features for breast cancer diagnosis, and used different machine learning algorithms to distinguish between benign and malignant breast tumors, including Logistic Regression (LR), Decision Trees, RF, Bayesian Network, Multilayer Perceptron (MLP), Radial Basis Function Networks (RBFN), SVM and Rotation Forest. Rotation Forest achieved the highest classification accuracy of 99.48%. Sharma et al [22] presented a comparative study on the detection of breast cancer using different machine learning algorithms including RF, k-Nearest-Neighbor (KNN) and Naïve Bayes. Their results showed that KNN had the best accuracy, precision and F1 score among these algorithms. Wen et al [23] indicated that a multianalyte biomarker panel is clinically useful during health check-ups for the screening of tumors such as hepatocellular carcinoma (HCC) and prostate malignancies. Their biomarker panel consisted of eight molecules: α-fetoprotein, carcinoembryonic antigen, prostate-specific antigen, CA19-9, CA125, CA15-3, squamous cell specific antigen, and cytokeratin 19 fragment. 
Wang et al [24] combined multiple serum tumor markers to detect various cancers using machine learning methods such as SVM and KNN. They found that these machine learning methods outperformed the use of individual tumor markers. Wang et al [25] demonstrated, using a large real-world dataset, that machine learning models using many biomarkers are capable of improving early detection of cancer.
The objective of this paper is to study the performance of a selection of machine learning methods for the detection of NPC using routine medical tests. The machine learning methods we use are: Random Forest, Support Vector Machine, Artificial Neural Network and k-Nearest-Neighbor. For comparison with a classical method, we include some performance comparisons with Logistic Regression.

Data collection and processing scheme
The data are collected from two hospitals: the Guangdong Provincial Hospital of Traditional Chinese Medicine-University Town Hospital and the Guangdong Provincial Hospital of Traditional Chinese Medicine-Main Hospital. Our main dataset contains a total of 1624 people recorded in the hospitals from 2013 to 2020 including 523 newly diagnosed NPC patients before treatment, 501 newly diagnosed Chronic Rhinosinusitis (CRS) patients before treatment and 600 healthy controls. The controls were randomly selected from 6873 people who came to the hospitals for routine medical checkups and were found to be free from NPC and other chronic diseases. In addition, we collected a secondary, smaller dataset consisting of 101 newly diagnosed NPC patients prior to treatment, who visited the hospitals between March 1 and Nov 30, 2021, and 100 healthy controls, who visited the hospitals during the same period.
We could identify individual participants during or after data collection. Diagnosis of CRS followed the Chinese CRS Diagnosis and Treatment Instruction (2021 Kunming version), which is a modified version of a European position paper on rhinosinusitis and nasal polyps (EPOS) [26]. All CRS cases were confirmed by pathology testing.
Diagnosis of NPC follows the TNM staging system, which considers the degree of local invasion of the primary tumor (T), the extent of regional lymph node metastasis (N), and the presence of distant metastasis (M) [27]. The current clinical staging standard is the UICC/AJCC 8th edition / China 2017 edition, as defined below.
a. Stage I (TNM classification: T1N0M0): The lesion was confined to the nasopharynx.
b. Stage II (TNM classification: T2N1M0): The tumor invaded the surrounding soft tissue and the whole nasal cavity, with single lymph node metastasis less than 6 cm in diameter and above the supraclavicular fossa.
c. Stage III (TNM classification: T3N2M0): The tumor invaded the skull base, with bilateral lymph node metastasis less than 6 cm in diameter and above the supraclavicular fossa.
d. Stage IV (TNM classification: T4N3M0): The tumor invaded the intracranial and cranial nerves and the orbit. Lymph node diameter is greater than 6 cm and there is supraclavicular fossa lymph node metastasis.
All NPC cases were confirmed by pathology testing. The clinical stages of the 523 NPC patients ranged from stage II to IV, of which 40 were stage II, 256 were stage III, and 227 were stage IV. The lack of stage I patients and the relatively low number of stage II patients arise because NPC is rarely detected early; thus data on early-stage patients are scarce. Fig 1A shows the MRI image of a Stage II NPC patient and Fig 1B shows the MRI image of a CRS patient. The question is: can we distinguish NPC from CRS using routine medical test data?
Each subject record in the dataset contains four categories of information: (1) demographic features (gender and age), (2) whole blood test feature indices (testing equipment: Mindray BC-6800 PLUS/6900 Whole Blood Cell analyzer), (3) liver function test feature indices (testing equipment: Roche Cobas 8000 analyzer), and (4) urine sediment test feature indices (testing equipment: Roche U601 semi-automatic urine dry chemistry analyzer). These are all considered routine medical data. Detailed information on the features is shown in Table 1, in which the numbers in parentheses are percentages. Some subject records also contained EBV antibody data, specifically VCA-IgA and EA-IgA, collected using a YHLO iFlash300-A chemiluminescence analyzer. The EBV antibody data were used only in Test 3.

PLOS ONE
Detection of nasopharyngeal carcinoma using routine medical tests via machine learning

Some data pre-processing was applied. We used Label 0 to represent controls, Label 1 to represent NPC patients, and Label 2 to represent CRS patients. Gender was encoded as a binary variable while age was encoded as an integer-valued variable, accurate to the year. Then, univariate variance analysis was performed to assess the significance of the association between each feature index and the labels. If the P-value of a feature after Bonferroni correction was greater than the threshold α = 0.05, the feature was deemed insignificant and excluded. As a result, the following feature indices were excluded: EOSIN, RBC, ALT, DBIL, Urea, PRO and NTT. The remaining features formed a 24-dimensional feature vector for modeling.
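The screening step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code used in the study: it assumes that the univariate variance analysis is a one-way ANOVA across the three labels (here via `scipy.stats.f_oneway`) and that the Bonferroni correction multiplies each raw P-value by the number of features tested. The helper name `bonferroni_anova_filter` is hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def bonferroni_anova_filter(X, y, alpha=0.05):
    """Keep features whose Bonferroni-corrected one-way ANOVA P-value is <= alpha.

    Assumption: one ANOVA per feature, comparing the feature's values across
    the label groups; the correction factor is the number of features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]
    p_raw = np.empty(n_features)
    for j in range(n_features):
        groups = [X[y == c, j] for c in classes]
        p_raw[j] = f_oneway(*groups).pvalue
    # Bonferroni: scale by the number of tests and cap at 1.
    p_adj = np.minimum(p_raw * n_features, 1.0)
    keep = np.flatnonzero(p_adj <= alpha)
    return keep, p_adj


# Synthetic stand-in data: labels 0 (control), 1 (NPC), 2 (CRS).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)
informative = rng.normal(loc=y * 1.5, scale=1.0)  # mean depends on the label
noise = rng.normal(size=y.size)                   # unrelated to the label
X = np.column_stack([informative, noise])

keep, p_adj = bonferroni_anova_filter(X, y)
print("kept feature indices:", keep)
print("corrected P-values:", p_adj)
```

In this sketch a feature is retained when its corrected P-value does not exceed 0.05, which mirrors the exclusion rule stated in the text.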

Methods
We used five machine learning methods: RF, SVM, ANN, LR and KNN. These methods are well established but use fundamentally different models and hence provide different views of a problem. The RF was built in Python using the function "RandomForestClassifier" in the package "sklearn.ensemble". The tree-building parameters were searched using the Python function "GridSearchCV" in the package "sklearn.model_selection". The SVM was built using the function "SVC" in the Python package "sklearn.svm" with the radial basis function kernel $K(x, x') = e^{-\gamma \lVert x - x' \rVert^{2}}$. The model parameters were searched using "GridSearchCV". The ANN was built using "Keras" in Python. The network consists of two hidden layers, with 64 nodes in Hidden Layer 1 and 16 nodes in Hidden Layer 2. We used the rectified linear unit (relu) activation function for both hidden layers. The LR was built using the function "LogisticRegression" in the package "sklearn.linear_model". The KNN was built using the function "KNeighborsClassifier" in the package "sklearn.neighbors".
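As an illustration of how the scikit-learn models named above might be assembled, the following sketch fits RF, SVM, LR and KNN on synthetic 24-feature, 3-class data. Everything beyond the function names given in the text is an assumption: the parameter grids, the feature standardization with `StandardScaler`, the number of neighbors, and the synthetic data are placeholders, not the settings used in the study. The Keras ANN is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 24-dimensional feature vectors and 3 labels.
X, y = make_classification(
    n_samples=300, n_features=24, n_informative=8, n_classes=3, random_state=0
)

# RF with a small, hypothetical grid over tree-building parameters.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
    cv=5,
)

# SVM with the RBF kernel; the grid values and the scaling step are assumptions.
svm_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [1, 10], "svc__gamma": ["scale", 0.01]},
    cv=5,
)

lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

models = {"RF": rf_search, "SVM": svm_search, "LR": lr, "KNN": knn}
for name, model in models.items():
    model.fit(X, y)

print("RF best parameters:", rf_search.best_params_)
print("SVM best parameters:", svm_search.best_params_)
```

The grid search here is nested inside whatever outer evaluation is used; a real study would choose the grids and the scoring metric to suit the data.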
For each method, we used 5-fold stratified cross-validation for evaluation. First, the dataset was randomly partitioned into 5 subsets of approximately equal size, each with a class distribution approximately equal to that of the whole dataset. Then, in each of five rounds, four subsets were combined as the training set and the remaining subset was used to evaluate the performance. The mean of the five testing results was used as the outcome. The evaluation indices include Precision, Recall, Accuracy, Area Under the receiver operating characteristic Curve (AUC), and Matthews Correlation Coefficient (MCC).
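The evaluation protocol above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the study's code: the data are synthetic, the multi-class precision and recall are macro-averaged, and the multi-class AUC is computed one-vs-rest, none of which is specified in the text. The helper name `stratified_cv_scores` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold


def stratified_cv_scores(model_factory, X, y, n_splits=5, seed=0):
    """Mean of each metric over the folds of a stratified k-fold split."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in skf.split(X, y):
        model = model_factory()  # fresh, untrained model for every fold
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        proba = model.predict_proba(X[test_idx])
        y_true = y[test_idx]
        rows.append({
            "accuracy": accuracy_score(y_true, pred),
            # Macro averaging is an assumption made for this sketch.
            "precision": precision_score(y_true, pred, average="macro", zero_division=0),
            "recall": recall_score(y_true, pred, average="macro", zero_division=0),
            "auc": roc_auc_score(y_true, proba, multi_class="ovr"),
            "mcc": matthews_corrcoef(y_true, pred),
        })
    return {key: float(np.mean([row[key] for row in rows])) for key in rows[0]}


X, y = make_classification(
    n_samples=300, n_features=24, n_informative=8, n_classes=3, random_state=0
)
scores = stratified_cv_scores(
    lambda: RandomForestClassifier(n_estimators=100, random_state=0), X, y
)
print(scores)
```

Passing a factory rather than a fitted model keeps the folds independent, since each fold trains a new model from scratch.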

Ethics declarations
Ethics approval was obtained from the Ethics Committee of the Guangdong Provincial Hospital of Chinese Medicine (reference number ZE2021-148-01). All research was performed in accordance with the Declaration of Helsinki 2013. This is a retrospective study with a large number of participants. Informed consent was obtained from the participants. In cases where participants had not been informed and their contact information was lost, a waiver of informed consent was approved by the ethics committee.

Results
A total of four tests were conducted. In Tests 1 and 2, all five methods were tested. For all subsequent tests, we narrowed our attention to RF. Tests 1, 2 and 3 used the primary dataset, while Test 4 used the secondary dataset.

Test 1-Classification of NPC, CRS and controls
This test was designed to examine whether the machine learning methods could distinguish NPC, CRS and healthy controls, and which methods performed better. Fig 2 shows the confusion matrices, in which the numbers are the means of the 5 runs of the 5-fold stratified cross-validation. Table 2 summarizes the performance of the five methods. RF performed best, achieving 83.1% accuracy with the 95% CI above 80%. It also had the best precision, recall and AUC. Moreover, its accuracy exceeded that of the classical LR method by 12%. This indicates that RF is effective. Table 3 shows the performance of RF in more detail. A close examination on

Test 2-Detection of NPC
This test was designed to evaluate the ability of the machine learning methods to detect NPC. Two sub-tests were conducted: the first was NPC patients vs a mix of 50% of the CRS patients (250 / 501) and 50% of the controls (300 / 600); the second was NPC patients vs CRS patients only. Table 4 summarizes the performance of the five machine learning methods. RF again performed best, achieving 88.2% accuracy. Hence, we focus on the use of RF in the subsequent discussions. Table 5 shows the performance of RF in distinguishing NPC patients from CRS patients. The precision in detecting NPC patients is 82.7% and the precision in detecting CRS patients is 85.3%. The AUC value is 0.82.
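The construction of the first sub-test's dataset can be illustrated as below. The sketch assumes that the halves of the CRS and control groups are drawn uniformly at random without replacement, which the text does not state; the helper name `build_npc_vs_mixed` and the seed are hypothetical. Labels follow the encoding used earlier (0 for controls, 1 for NPC, 2 for CRS).

```python
import numpy as np


def build_npc_vs_mixed(labels, frac=0.5, seed=0):
    """Select all NPC subjects plus a fraction of the controls and CRS subjects.

    Returns the selected row indices and a binary target (1 = NPC, 0 = non-NPC).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = [np.flatnonzero(labels == 1)]  # keep every NPC subject
    for cls in (0, 2):                      # controls, then CRS
        idx = np.flatnonzero(labels == cls)
        n_take = int(len(idx) * frac)       # 300 of 600 and 250 of 501
        picked.append(rng.choice(idx, size=n_take, replace=False))
    selected = np.sort(np.concatenate(picked))
    target = (labels[selected] == 1).astype(int)
    return selected, target


# Group sizes taken from the primary dataset: 523 NPC, 501 CRS, 600 controls.
labels = np.array([1] * 523 + [2] * 501 + [0] * 600)
selected, target = build_npc_vs_mixed(labels)
print(len(selected), int(target.sum()))
```

With these group sizes the selection contains 523 NPC subjects and 550 non-NPC subjects, so the two classes are roughly balanced.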

Figs 4 and 5 show the importance ranking of the features given by RF. Table 6 lists the top 8 most important features and their weights in these two tests. The first 4 features are the same in both tests, and they are also similar to those of Test 1. This is a promising result because it shows that these features are not merely markers of general ill health, but can distinguish NPC specifically from CRS.
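A feature ranking of this kind can be obtained from a fitted forest as sketched below. The text does not say which importance measure was used; this example assumes scikit-learn's impurity-based `feature_importances_` attribute, and both the synthetic data and the feature names are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data with 24 features; names are placeholders.
X, y = make_classification(
    n_samples=400, n_features=24, n_informative=6, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
importances = rf.feature_importances_
order = np.argsort(importances)[::-1]  # most important first
top8 = [(feature_names[i], float(importances[i])) for i in order[:8]]

for name, weight in top8:
    print(f"{name}: {weight:.3f}")
```

Impurity-based importances can favor features with many distinct values, so permutation importance on held-out data is a common cross-check.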

Test 3-A comparison to EBV testing
As pointed out in the Introduction, EBV antibody tests are effective in detecting NPC. Our third test compared the effectiveness of routine medical test data with that of EBV test data for detecting NPC, in all cases using RF. Three sub-tests were conducted in which RF was trained to distinguish NPC patients from controls; they differed in which features were given to RF: routine medical test data only, EBV antibody data only, and both. Table 7 shows the classification performance. RF with routine medical data only performs better than RF with EBV data only. Using both routine medical test data and EBV data results in even better accuracy. The AUC and MCC figures indicate the same. For this test, we examined the incorrectly classified subjects in more detail. Table 8 shows the number and percentage of subjects classified incorrectly. False negative cases, i.e. NPC patients classified as healthy, are further broken down by their NPC stage.
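The design of the three sub-tests, in which the same classifier is evaluated on different column subsets, can be sketched as follows. The data are synthetic, so the resulting numbers say nothing about the real comparison; the column layout (24 routine features followed by 2 EBV antibody features), the effect sizes and the forest settings are assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)  # 1 = NPC, 0 = control (synthetic)

# Hypothetical layout: columns 0-23 routine features, columns 24-25 EBV antibodies.
routine = rng.normal(size=(n, 24)) + 0.5 * y[:, None]
ebv = rng.normal(size=(n, 2)) + 1.0 * y[:, None]
X = np.hstack([routine, ebv])

feature_sets = {
    "routine": np.arange(0, 24),
    "ebv": np.arange(24, 26),
    "both": np.arange(0, 26),
}

# The same folds are reused so that the three sub-tests are directly comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, cols in feature_sets.items():
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    fold_acc = cross_val_score(rf, X[:, cols], y, cv=cv, scoring="accuracy")
    results[name] = float(fold_acc.mean())

print(results)
```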

Test 4-Testing RF on the secondary dataset
Finally, to evaluate the robustness of RF's learning, we applied RF, with the forest trained on routine medical features only using the primary dataset, to the secondary dataset. It achieved an accuracy of 91.9% and an AUC of 0.975, which is comparable to the result in Test 3 (refer to Table 7). This result is consistent with the findings in Test 2 as well (refer to Table 4).
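Evaluating a forest on a dataset it never saw during training can be sketched as below. Both cohorts are synthetic and drawn from the same hypothetical distribution, so the example only shows the mechanics: the model is fitted once on the primary cohort and then scored, without refitting, on the secondary cohort. The helper name `make_cohort` and all sizes are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score


def make_cohort(n_per_class, seed):
    """Synthetic cohort with 24 features; label 1 shifts every feature mean."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(y.size, 24)) + 0.6 * y[:, None]
    return X, y


X_primary, y_primary = make_cohort(500, seed=0)      # used for training only
X_secondary, y_secondary = make_cohort(100, seed=1)  # used for testing only

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_primary, y_primary)

pred = rf.predict(X_secondary)
proba_npc = rf.predict_proba(X_secondary)[:, 1]  # probability of label 1

external_accuracy = accuracy_score(y_secondary, pred)
external_auc = roc_auc_score(y_secondary, proba_npc)
print(f"accuracy={external_accuracy:.3f}, auc={external_auc:.3f}")
```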

Discussions
In this study, we evaluated the performance of machine learning methods, particularly RF, for detection of NPC. We used a main sample of 1624 subjects, of which 523 were newly diagnosed NPC patients, 501 were newly diagnosed CRS patients, and 600 were healthy controls.
Test 3 evaluated the performance of RF at distinguishing NPC vs controls. There were three sub-tests, which differed in which features were given to RF. When RF was given only routine medical data, it achieved an accuracy of 95.0% and an AUC of 0.986. When RF was given only EBV antibody data, it achieved an accuracy of 90.4% and an AUC of 0.928. When RF was given both, it achieved an accuracy of 96.9% and an AUC of 0.990. This is a promising result because it shows NPC may be accurately detected using only routine medical data, reducing the need for costly EBV tests. The false negative rate of 7.5% for stage II NPC patients is also a promising result because it suggests RF with routine medical data may be effective at detecting even early-stage NPC, improving the patient's chances of survival.
Finally, Test 4 evaluated the forest trained on routine medical features only in Test 3 by applying it to the subjects in the secondary dataset. It achieved an accuracy of 91.9% and an AUC of 0.975.
We acknowledge some limitations of our study. Our dataset consisted of only 1825 subjects (1624 in the primary dataset and 201 in the secondary dataset) and only 3 classes: NPC, CRS and healthy controls. Ideally, our dataset should include not only a larger total number of subjects, but also a wider variety of health conditions such as other types of cancers. Additionally, the study included no Stage I NPC patients and only 40 Stage II patients. Thus, while the performance of RF at detecting even Stage II NPC patients in Test 3 was promising, a larger study with more early-stage NPC patients is needed to confirm and extend this result.

Conclusions and future prospects
This paper studies the performance of machine learning methods, particularly Random Forest (RF), at detecting Nasopharyngeal Carcinoma (NPC) using routine medical test data. We believe such methods can play an important role in the future. They can be implemented easily and without much additional cost. Their results can serve as a warning; following a positive classification, patients should follow up with a definitive check such as MRI and pathological testing. Further research should confirm whether these methods are effective even at detecting early-stage NPC. This idea may also be used to study the detection of other types of cancers.